Week 4 Live Session

Preparing for Production

Dr Zak Varty

Session Plan

  1. Week in review
  2. Discussion of activities
  3. Code profiling exercise

Optional

  1. Reading Group - LIME

Week In Review

Topic Summary

Reproducibility

  • Reproduction: your data and code
  • Replication: generalise to new data
  • Stochastic methods

Explainability

  • Local / Global, Conditional / Marginal
  • Even simple models can be tricky
  • Meta-models (LIME and friends)

Scaling

  • Frequency of use and size of data
  • Profiling with {tictoc} and {profvis}
  • Profiling linked to debugging.

Task Review


Core

  • Read the LIME paper, which we will discuss during the live session.

  • Work through the understanding LIME R tutorial.

  • Use code profiling tools to assess the performance of your rolling_mean() and rolling_sd() functions. Identify any efficiencies that can be made.

Task Review


Bonus

Simulate a homogeneous Poisson process (HPP) \(\{X_1, \ldots, X_N\}\) on \([0,T]\) in two ways:

  • First use \(X_{i+1} - X_{i} \sim \text{Exp}(\lambda)\).

  • Second use \(N \sim \text{Pois}(\lambda)\) and \(X_i | N = n \overset{\mathrm{i.i.d}}{\sim} \text{Unif}(0, T)\).

Evaluate and compare the reproducibility and scalability of each implementation.

Discussion of Tasks

Where did you identify bottlenecks in your rolling_mean() and rolling_sd() functions?

04:00

Profiling rolling_mean()

Calculate weekly mean on 1 year of data:


Console
source("src/pad-with-NAs.R")
source("src/rolling-mean.R")
sim_data <- 1:365
profvis::profvis(rolling_mean(sim_data, window_width = 7))


Error in parse_rprof_lines(lines, expr_source) : 
      No parsing data available. Maybe your function was too fast?

Profiling rolling_mean()

Solutions:

  • {microbenchmark} if you are really interested in instances of this size.
  • Scale up to a larger example
Console
source("pad-with-NAs.R")
source("rolling-mean.R")
sim_data <- 1:36500
profvis::profvis(rolling_mean(x = sim_data, window_width = 7))
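The first option could look like the sketch below: {microbenchmark} evaluates each expression many times (100 by default), so it can resolve timings that are too quick for profvis to sample. The expressions here are simple stand-ins for your own rolling_mean() call.

```r
# {microbenchmark} repeats each expression many times and reports the
# distribution of timings, so even microsecond-scale functions show up.
library(microbenchmark)

microbenchmark(
  mean(1:365),        # stand-in for rolling_mean(sim_data, window_width = 7)
  sum(1:365) / 365,
  times = 100
)
```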

Profiling

What did / could you do to resolve those bottlenecks?

04:00

What did / could you do to resolve those bottlenecks?


  • Drop and append to reduce copying

  • Use data structure / language that allows modification in place

  • Switch to “Online” rolling mean estimate

  • Any other ideas not already mentioned?
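One way the “online” rolling mean might look (a sketch; it assumes the convention that the first window_width - 1 entries are returned as NA):

```r
# "Online" rolling mean: keep a running window sum and update it by
# adding the newest value and dropping the oldest, instead of re-summing
# the whole window at every position.
rolling_mean_online <- function(x, window_width) {
  n <- length(x)
  out <- rep(NA_real_, n)
  window_sum <- sum(x[seq_len(window_width)])
  out[window_width] <- window_sum / window_width
  for (i in seq_len(n - window_width) + window_width) {
    # update the running sum rather than recomputing it
    window_sum <- window_sum + x[i] - x[i - window_width]
    out[i] <- window_sum / window_width
  }
  out
}
```

Each step now does a constant amount of work, independent of window_width.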

Profiling Exercises

Example - Reinventing the wheel

Compare the time it takes to sum the rows of a matrix using a for loop versus using the built-in function rowSums().

n_rows <- 1e6
n_cols <- 2

# create a matrix with random entries
X <- matrix(data = rnorm(n_rows * n_cols), ncol = n_cols)


system.time({
  row_sums_X <- rep(NA, nrow(X))
  
  for (i in seq_along(row_sums_X)) {
    row_sums_X[i] <- sum(X[i, ])
  }
})
   user  system elapsed 
  0.816   0.014   0.835 

Example 1

# For loop
system.time({
  row_sums_X <- rep(NA, nrow(X))
  
  for (i in seq_along(row_sums_X)) {
    row_sums_X[i] <- sum(X[i, ])
  }
})
   user  system elapsed 
  0.803   0.012   0.823 


# Using built in function
system.time( 
    row_sums_X <- rowSums(X)
)
   user  system elapsed 
  0.012   0.005   0.018 

Task 1 - Check the Docs

For Loop:

   user  system elapsed 
  0.880   0.014   0.915 

Built-in function rowSums():

   user  system elapsed 
  0.010   0.002   0.011 

Task 1

The output is three values: ‘user’, ‘system’ and ‘elapsed’. Using the documentation for system.time(), investigate what these mean.

Extension: What happens if we consider a “wide” or “square-ish” matrix, rather than a “tall” matrix?

03:00

Solution 1

“system.time() calls the function proc.time(), evaluates expr, and then calls proc.time() once more, returning the difference between the two proc.time() calls.” - system.time {base}

“proc.time() returns five elements for backwards compatibility, but its print method prints a named vector of length 3. The first two entries are the total user and system CPU times of the current R process and any child processes on which it has waited, and the third entry is the ‘real’ elapsed time since the process was started.” - proc.time {base}


Note:

“Timings of evaluations of the same expression can vary considerably depending on whether the evaluation triggers a garbage collection. When gcFirst = TRUE a garbage collection (gc) will be performed immediately before the evaluation of expr. This will usually produce more consistent timings.” - system.time {base}

Task 2 - A Different Approach

Investigate whether using X[i, 1] + X[i, 2] rather than sum(X[i, ]) significantly alters any of these values.

05:00

Solution 2

row_sums_X <- rep(NA, 1000000)

system.time( 
  gcFirst = TRUE,
  for (i in 1:nrow(X)) {
    row_sums_X[i] <- sum(X[i, ])
  }
)
   user  system elapsed 
  0.820   0.017   0.843 
row_sums_X <- rep(NA, 1000000)

system.time(
  gcFirst = TRUE,
  for (i in 1:nrow(X)) {
    row_sums_X[i] <- X[i, 1] + X[i, 2]
  }
)
   user  system elapsed 
  0.098   0.001   0.104 

In this case, calling the sum() function is less efficient than doing two look-ups and an addition within R: extracting the row X[i, ] allocates an intermediate vector on every iteration, and the call to sum() adds function-call overhead, which for just two elements outweighs the benefit of the compiled summation.

This is interesting because it goes against the standard advice that vectorised or compiled code will generally be faster.

Task 3 - Generating Random Variates

Below are three ways to create a vector of Gaussian random variates. Measure how long each approach takes and explain the ordering that you find.

random_vals <- NULL
for (i in 1:10000) {
  random_vals <- c(random_vals, rnorm(n = 1, mean = i, sd = 1))
}


random_vals <- rep(NA,10000)
for (i in 1:10000) {
  random_vals[i] <- rnorm(n = 1, mean = i, sd = 1)
}


random_vals <- rnorm(n = 10000, mean = 1:10000, sd = 1)
08:00

Solution 3

Let’s simplify things by defining each of these as a function.


random_vals_grow <- function(){
  random_vals <- NULL
  for (i in 1:10000) {
    random_vals <- c(random_vals, rnorm(n = 1, mean = i, sd = 1))
  }
}


random_vals_preallocate <- function(){
  random_vals <- rep(NA,10000)
  for (i in 1:10000) {
    random_vals[i] <- rnorm(n = 1, mean = i, sd = 1)
  }
}


random_vals_vector <- function(){
  random_vals <- rnorm(n = 10000, mean = 1:10000, sd = 1)
}

Solution 3


library("tictoc")

tic()
random_vals_grow()
toc()
0.458 sec elapsed
tic()
random_vals_preallocate()
toc()
0.034 sec elapsed
tic()
random_vals_vector()
toc()
0.002 sec elapsed

Solution 3

  • Preallocation improves over growing the vector because it allows modification in place. For a detailed discussion of R’s copy on modify behaviour see Advanced R Chapter 2.

  • The vectorised code essentially moves the for loop to C. This is not only a faster language but also reduces the number of calls that have to be made to C functions.
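The first bullet can be seen in miniature with two toy functions (sizes here are arbitrary):

```r
# Growing with c() copies the whole vector on every iteration, giving
# O(n^2) total work; preallocating and filling in place is O(n).
grow <- function(n) {
  v <- NULL
  for (i in 1:n) v <- c(v, i)
  v
}

prealloc <- function(n) {
  v <- rep(NA_real_, n)
  for (i in 1:n) v[i] <- i
  v
}

identical(as.numeric(grow(100)), prealloc(100))  # TRUE - same result, different cost
```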

Task 4 - Simple Earthquake Simulation

The following code will simulate a homogeneous Poisson process on the interval \([0,t_{\text{max}}]\) with rate \(\lambda = 0.2\).


Each event has a mark associated with it, drawn from a shifted exponential distribution.


t_max <- 100 
lambda <- 0.2 

event_times <- c()
event_marks <- c() 
t <- 0

time_to_first_event <- rexp(n = 1, rate = lambda)
t <- t + time_to_first_event

while (t < t_max) {
  event_times <- c(event_times, t)
  event_marks <- c(event_marks, 3 + rexp(n = 1, rate = 1))
  time_to_next_event <- rexp(n = 1, rate = lambda)
  t <- t + time_to_next_event
}

point_pattern <- data.frame(event_times, event_marks)

Task 4.1 Simple Earthquake Simulation Function

Task 4.1: Convert this code to a function.


t_max <- 100 
lambda <- 0.2 

event_times <- c()
event_marks <- c() 
t <- 0

time_to_first_event <- rexp(n = 1, rate = lambda)
t <- t + time_to_first_event

while (t < t_max) {
  event_times <- c(event_times, t)
  event_marks <- c(event_marks, 3 + rexp(n = 1, rate = 1))
  time_to_next_event <- rexp(n = 1, rate = lambda)
  t <- t + time_to_next_event
}

point_pattern <- data.frame(event_times, event_marks)
03:00

Task 4.1 Simple Earthquake Simulation Function

Solution 4.1


sim_hpp <- function(lambda = 0.2, t_max = 100, offset = 3){
  
  event_times <- c()
  event_marks <- c() 
  t <- 0 

  time_to_first_event <- rexp(n = 1, rate = lambda)
  t <- t + time_to_first_event
  
  while (t < t_max) {
    event_times <- c(event_times, t)
    event_marks <- c(event_marks, offset + rexp(n = 1, rate = 1))
    time_to_next_event <- rexp(n = 1, rate = lambda)
    t <- t + time_to_next_event
  }
  
  point_pattern <- data.frame(time = event_times, mark = event_marks)
  
  # return the data.frame so that it can be stored, but do not print to console
  invisible(point_pattern)
}

Task 4.2: Investigating Scalability

Task 4.2: Create a plot showing how the runtime of this code changes as \(t_{\text{max}}\) increases.

05:00

Solution 4.2

Time the code:

t_max_values <- seq(5000, 50000, by = 1000)
run_times <- rep(NA, length(t_max_values))

for (run in seq_along(t_max_values)) {
  
  tic(quiet = TRUE)
  sim_hpp(t_max = t_max_values[run])
  timer <- toc(quiet = TRUE)

  run_times[run] <- timer$toc - timer$tic
  #print(paste("Completed run with t_max = ", t_max_values[run]))
}

Solution 4.2

Create the plot:

plot(
  x = t_max_values,
  y = run_times,
  pch = 16,
  ylab = "Runtime (seconds)", 
  xlab = "t_max")

Solution 4.2

Task 4.3 - Code Profiling

Simulate an earthquake catalogue on [0, 100000] with rate \(\lambda= 0.2\).


Use profvis::profvis() to identify bottlenecks in your code.


Note: We need to remember to source the function from a separate script so that we get a line-by-line breakdown.

05:00

Solution 4.3

library(profvis)
source("sim_hpp.R")
profvis::profvis(sim_hpp(t_max = 100000))

Task 4.4: Refactoring

Rewrite the code to be more efficient and create a second plot comparing the runtime of your new function to the original.

10:00

Solution 4.4

Solving this bottleneck is complicated by the fact that we do not know how many observations we will have before running the code.


We know that our total number of events \(N \sim \text{Poisson}(\lambda t_{\text{max}})\), which means we can (with very low probability) have an arbitrarily large number of events to simulate.

One solution is to pre-allocate storage that needs to be extended on only 1 in 1000 runs of the function and which further extends that storage as required.
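As a rough check on how conservative this pre-allocation is (using the lambda and t_max from the profiling task):

```r
lambda <- 0.2
t_max <- 100000

expected_count <- lambda * t_max                       # 20000 events on average
storage_size <- qpois(p = 0.999, lambda = expected_count)

# The 99.9% quantile sits only a few percent above the mean, so the
# conservative pre-allocation wastes very little memory while being
# sufficient on 999 in 1000 runs.
c(expected = expected_count, allocated = storage_size)
```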

Solution 4.4


sim_hpp_preallocated <- function(lambda = 0.2, t_max = 100, offset = 3){
 
initial_storage_size <- qpois(p = 0.999, lambda = lambda * t_max)
event_times <- rep(NA, initial_storage_size)
event_marks <- rep(NA, initial_storage_size) 
t <- 0 

time_to_first_event <- rexp(n = 1, rate = lambda)
t <- t + time_to_first_event

event_count <- 0

while (t < t_max) {
  # increment event counter and record details of this event
  event_count <- event_count + 1
  event_times[event_count] <- t
  event_marks[event_count] <- offset + rexp(n = 1, rate = 1)
  
  # calculate time of next event
  time_to_next_event <- rexp(n = 1, rate = lambda)
  t <- t + time_to_next_event
}

hpp <- data.frame(event_times, event_marks)

# keep only rows that have been filled in
hpp[!is.na(hpp$event_times), ]
}

Solution 4.4

We can then compare the time it takes to run each function as \(t_{\text{max}}\) increases.

Solution 4.4

Create the plot of runtimes

plot(
  x = t_max_values,
  y = run_times_1,
  xlab = "t_max", 
  ylab = "runtime (seconds)", 
  pch = 16)
points(x = t_max_values, y = run_times_2, col = "red", pch = 16)
legend(
  "topleft",
  legend = c("Growing vectors", "Conservative pre-allocation"),
  col = 1:2,
  pch = c(16,16), 
  bty = "n")

Solution 4.4

Task 4.5: Work smarter, not harder

Often using different data structures or simulation techniques is much more effective than trying to wring every last drop of efficiency from your current implementation.

Task 4.5: Compare this to an implementation that uses the Poisson distribution of the total event count to first simulate the number of events and then randomly allocate event locations over the interval.

10:00

Solution 4.5: Function definition
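The function definition itself did not survive in these notes; below is a minimal sketch consistent with the task description and with the call to sim_hpp_conditional() in the timing loop that follows. The argument names and the mark distribution mirror sim_hpp().

```r
# Conditional construction of the HPP: first draw the total event count
# N ~ Poisson(lambda * t_max), then scatter the events uniformly over
# [0, t_max] and sort them into time order. Marks are drawn as before.
sim_hpp_conditional <- function(lambda = 0.2, t_max = 100, offset = 3) {
  n_events <- rpois(n = 1, lambda = lambda * t_max)
  event_times <- sort(runif(n = n_events, min = 0, max = t_max))
  event_marks <- offset + rexp(n = n_events, rate = 1)
  invisible(data.frame(time = event_times, mark = event_marks))
}
```

No growing or pre-allocation is needed: all events are generated in one vectorised call.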


We can compare runtime in a similar way, and find that this scales even more favourably than our previous approach.

Solution 4.5: Method Comparison

run_times_3 <- rep(NA, length(t_max_values))

for (run in seq_along(t_max_values)) {
  
  set.seed(1234)
  
  tic(quiet = TRUE)
  sim_3 <- sim_hpp_conditional(t_max = t_max_values[run])
  timer_3 <- toc(quiet = TRUE)
  
  #print(paste("Completed run with t_max = ", t_max_values[run]))
  
  run_times_3[run] <- timer_3$toc - timer_3$tic
}

Solution 4.5: Method Comparison Plot

Solution 4.5:

Note: It is no longer possible to ensure that the same number of points are generated in each iteration, because we are not generating random variates in the same order. We might want to further investigate whether this improvement is stable across many realisations.
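The first point can be made concrete with the first random draw of each method (the seed value is arbitrary):

```r
# The sequential method's first draw is an exponential waiting time; the
# conditional method's first draw is a Poisson count. From the same seed
# the two methods consume the random number stream in different orders,
# so their realisations cannot match draw-for-draw.
set.seed(1234)
first_gap <- rexp(n = 1, rate = 0.2)

set.seed(1234)
event_count <- rpois(n = 1, lambda = 0.2 * 100)
```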


Question: What is going on with \(t_{\text{max}} = 5000\)?

Summary


  1. Profiling is a useful tool for identifying code bottlenecks that become large problems in production.

  2. Tips:

  • Avoid growing objects
  • Built-in functions come pre-optimised
  • Vectorisation is (usually) your friend
  • Preallocate memory where possible
  • Parallelisation and interfacing are a last resort.

  3. Often you can work smarter with better algorithms or data structures (but sometimes you just have to grin and bear it).

Bonus discussion